Abstract — This paper introduces a network coding-based protection
scheme against single and multiple link failures. The
proposed strategy ensures that in a connection, each node receives 
two copies of the same data unit: one copy on the working 
circuit, and a second copy that can be extracted from linear 
combinations of data units transmitted on a shared protection 
path. This guarantees instantaneous recovery of data units upon 
the failure of a working circuit. The strategy can be implemented 
at an overlay layer, which makes its deployment simple and 
scalable. While the proposed strategy is similar in spirit to the 
work of Kamal '07 & '10, there are significant differences. In 
particular, it provides protection against multiple link failures. 
The new scheme is simpler, less expensive, and does not require 
the synchronization required by the original scheme. The sharing 
of the protection circuit by a number of connections is the key to 
the reduction of the cost of protection. The paper also conducts 
a comparison of the cost of the proposed scheme to the 1+1 and 
shared backup path protection (SBPP) strategies, and establishes 
the benefits of our strategy. 

Index Terms — Network protection, Overlay protection, Net- 
work coding, Survivability 

I. Introduction 

Research on techniques for providing protection to networks 
against link and node failures has received significant attention 
[1]. Protection, which is a proactive technique, refers to
reserving backup resources in anticipation of failures, such that 
when a failure takes place, the pre-provisioned backup circuits 
are used to reroute the traffic affected by the failure. Several 
protection techniques are well known, e.g., in 1+1 protection, 
the connection traffic is simultaneously transmitted on two link 
disjoint paths. The receiver picks the path with the stronger
signal. On the other hand, in 1:1 protection, transmission on
the backup path only takes place in the case of failure. Clearly,
1+1 protection provides instantaneous recovery from failure,
but at increased cost: the cost of protection circuits is
at least equal to the cost of the working circuits, and typically
exceeds it. To reduce the cost of protection circuits, 1:1
protection has been extended to 1:N protection, in which one 
backup circuit is used to protect N working circuits. However, 
failure detection and data rerouting are still needed, which 
may slow down the recovery process. In order to reduce the 
cost of protection, while still providing instantaneous recovery, 
references lfl3l . lfT31 proposed the sharing of one set of 
protection circuits by a number of working circuits, such that 
each receiver in a connection is able to receive two copies of
the same data unit: one on the working circuit, and another one 
from the protection circuit. Therefore, when a working circuit 
fails, another copy is readily available from the protection 
circuit. The sharing of the protection circuit was implemented 
by transmitting data units such that they are linearly combined 
inside the network, using the technique of network coding 
[16]. Two linear combinations are formed and transmitted
in two opposite directions on a p-Cycle [4]. We refer to
this technique as 1+N protection, since one set of protection 
circuits is used to simultaneously protect a number of working 
circuits. The technique was generalized for protection against 
multiple failures in [14].

In this paper, we propose a new method for protection 
against multiple failures that is related to the techniques of 
[13], [14]. Our overall objective is still the same; however,
the proposed scheme improves upon the previous techniques 
in several aspects. First, instead of cycles, we use paths 
to carry the linear combinations. This reduces the cost of 
implementation even further, since in the worst case the 
path can be implemented using the cycle less one segment 
(that may consist of several links). Moreover, a path may 
be feasible, while a cycle may not. Second, each linear 
combination includes data units transmitted from the same 
round, as opposed to transmitting data units from different 
rounds as proposed in [13]. This simplifies the implementation
and synchronization between nodes. This aspect is especially 
important when considering a large number of protection 
paths, since synchronization becomes a critical issue in this 
case. The protocol implementation is therefore self-clocked 
since data units at the heads of the local buffers in each node 
are combined provided that they belong to the same round. 
Overall, these improvements result in a simple and scalable 
protocol that can be implemented at the overlay layer. The 
paper also includes details about implementing the proposed 
strategy. A network coding scheme to protect against adversary 
errors and failures under a similar model is proposed in Q, 
in which more protection resources are required. 

This paper is organized as follows. In Section II we introduce
our network model and assumptions. In Section III we
introduce the modified technique for protection against single
failures. Implementation issues are discussed in Section IV.
In Section V we present a generalization of this technique for
protecting against multiple failures. The encoding coefficient
assignment is discussed in Section VI. In Section VII we
present an integer linear programming formulation to provision
paths to protect against single failures. Section VIII provides
some results on the cost of implementing the proposed technique,
and compares it to 1+1 protection and SBPP. Section
IX concludes this paper with a few remarks.

II. Model and Assumptions 

In this section we introduce our network model and the 
operational assumptions. We also define a number of variables 
and parameters which will be used throughout the paper. 

A. Network Model 

We assume that the network is represented by an undirected 
graph, G(V, E), where V is the set of nodes and E is the 
set of edges. Each node corresponds to a switching node, 
e.g., a router, a switch or a crossconnect. Network users 
access the network by connecting to input ports of such 
nodes, possibly through multiplexing devices. Each undirected 
edge corresponds to two transmission links, e.g., fibers, which 
carry data in two opposite directions. The capacity of each 
link is a multiple of a basic transmission unit, which can be 
wavelengths, or smaller tributaries, such as DS-3, or OC-3. In 
this paper, we do not impose an upper limit on the capacity of 
a link, and we assume that it carries a sufficiently large number 
of basic tributaries, i.e., we consider the uncapacitated case. 

In order to protect against single link failures, the network
graph needs to be at least 2-connected. That is, between each
pair of nodes, there need to be at least two link disjoint
paths. The number of protection paths, and the connections
protected by each of these paths, depend on the connections
and their end points, as well as the network graph. An example
of connection protection in NSFNET will be given in Section
III-A. In general, for protection against M link failures, the graph
needs to be (M + 1)-connected.

Since providing protection to connections will require the 
use of finite field arithmetic, these functions are better imple- 
mented in the electronic domain. Therefore, we assume that 
protection is provided at a layer that is above the optical layer, 
and this is why we refer to this type of protection as overlay 
protection. 

B. Operational Assumptions 

We make the following operational assumptions: 

1) The protection is at the connection level, and it is 
assumed that all connections that are protected together 
will have the same transport capacity, which is the max- 
imum bit rate that has to be handled by the connection. 
We refer to this transport capacity as B (footnote 1).

2) All connections are bidirectional. 

3) Paths used by connections that are jointly protected are 
link disjoint. 

4) A set of connections will be protected together by a
protection path. The protection path is bidirectional,
and it passes through all end nodes of the protected
connections. The protection path is also link disjoint
from the paths used by the protected connections.

1 Throughout this paper we assume that all connections that are protected
together have the same transport capacity. The case of unequal transport
capacities can also be handled, but will not be addressed in this paper.

5) Links of the protection path protecting a set of connections
have the same capacity as these connections, i.e.,
B.

6) Segments of the protection path are terminated at each 
connection end node on the path. The data received on 
the protection path segment is processed, and retrans- 
mitted on the outgoing port, except for the two extreme 
nodes on the protection path. 

7) Data units are fixed and equal in size. 

8) Nodes are equipped with sufficiently large buffers. The
upper bound on buffer sizes will be derived in Section
IV-C.

9) When a link carrying active (working) circuits fails, the 
receiving end of the link receives empty data units. We 
regard this to be a data unit containing all zeroes. 

10) The system works in time slots. In each time slot a new
data unit is transmitted by each end node of a connection
on its primary path (footnote 2). In addition, this end node also
transmits a data unit in each direction on the protection
path. The exact specifications of the protocol and of the
data unit are given later.

11) The amount of time consumed in solving a system of
equations is negligible in comparison to the length of a
time slot. This ensures that the buffers are stable (footnote 3).

The symbols used in this paper are listed in Table I and
will be further explained within the text. The upper half of the
table defines symbols which relate to the working, or primary
connections, and the lower half introduces the symbols used
in the protection circuits. All operations in this paper are over
the finite field GF(2^m), where m is the length of the data
unit in bits. It should be noted that all addition operations (+)
over GF(2^m) can be simply performed by bitwise XORs. In
fact, for protection against single-link failures we only require
addition operations, which justifies the last assumption above.
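
As a minimal sketch of the addition used throughout, the following Python fragment treats a data unit as a fixed-size byte string and adds two units by bitwise XOR. The function name `xor_units` and the sample values are illustrative assumptions, not part of the proposed protocol.

```python
def xor_units(a: bytes, b: bytes) -> bytes:
    """Add two equal-size data units over GF(2^m), i.e. bitwise XOR."""
    if len(a) != len(b):
        raise ValueError("data units must have equal size")
    return bytes(x ^ y for x, y in zip(a, b))

d = bytes([0x12, 0x34, 0x56])      # hypothetical data unit sent by one end node
u = bytes([0xAB, 0xCD, 0xEF])      # hypothetical data unit sent by its peer
combined = xor_units(d, u)         # d + u

# Adding d a second time cancels it and leaves u.
assert xor_units(combined, d) == u

# A failed link delivers an all-zero unit, which leaves a sum unchanged.
empty = bytes(len(d))
assert xor_units(d, empty) == d
```

Because XOR is its own inverse, no subtraction or division is needed in the single-failure case.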

III. 1+N Protection Against Single Link Failures 

In this section we introduce our strategy for implementing 
network coding-based protection against single link failures. 

Consider a set $\mathcal{N}$ of bidirectional, unicast connections, where
the number of connections is given by $N = |\mathcal{N}|$. Connection
$i \leftrightarrow j$ is between nodes $S_i$ and $T_j$. Nodes $S_i$ and $T_j$ belong
to the two ordered sets $\mathcal{S}$ and $\mathcal{T}$, respectively. Data units are
transmitted by nodes in $\mathcal{S}$ and $\mathcal{T}$ in rounds, such that the data
unit transmitted from $S_i$ to $T_j$ in round $n$ is denoted by $d_i(n)$,
and the data unit transmitted from $T_j$ to $S_i$ in the same round
is denoted by $u_j(n)$ (footnote 4). The data units received by nodes $S_i$
and $T_j$ are denoted by $\hat{u}_j$ and $\hat{d}_i$, respectively, and can be zero
in the case of a failure on the primary circuit between $S_i$ and $T_j$.

2 The terms primary and working circuits, or paths, will be used interchangeably.

3 Typically, a single connection will have a bit rate on the order of 10's or
100's of Mbps that is much lower than the capacity of a fiber or a wavelength.
Therefore, we assume that the processing elements of a switching node will
be able to process the data units within the transmission time of one data unit.

4 For simplicity, the round number, n, may be dropped when it is obvious.

TABLE I

List of symbols: Upper half are symbols used for working
paths, and lower half are symbols for protection paths.

| Symbol | Meaning |
|---|---|
| $\mathcal{N}$ | set of connections to be protected |
| $N$ | number of connections $= |\mathcal{N}|$ |
| $\mathcal{S}, \mathcal{T}$ | two disjoint ordered sets of communicating nodes, such that a node in $\mathcal{S}$ communicates with a node in $\mathcal{T}$ |
| $\mathcal{S}_k, \mathcal{T}_k$ | sets of connection end nodes protected by $P_k$ |
| $S_i, T_j$ | nodes in $\mathcal{S}$ and $\mathcal{T}$, respectively |
| $d_i, u_j$ | data units sent by nodes $S_i$ and $T_j$, respectively |
| $\hat{d}_i, \hat{u}_j$ | data units sent by nodes $S_i$ and $T_j$, respectively, on the primary paths, which are received by their respective receiver nodes |
| $T(S_i)$ | node in $\mathcal{T}$ transmitting to and receiving from $S_i$ |
| $S(T_j)$ | node in $\mathcal{S}$ transmitting to and receiving from $T_j$ |
| $B$ | the capacity protected by the protection path |
| $n$ | round number |
| $M$ | total number of failures to be protected against ($M = 1$ in Section III) |
| $P$ (or $P_k$) | bidirectional path used for protection |
| $\mathcal{P}$ | set of protection paths |
| $\mathbf{S}, \mathbf{T}$ | unidirectional paths of $P$ started by $S_1$ and $T_1$, respectively |
| $\sigma(S_i)$ ($\sigma(T_j)$) | the next node downstream from $S_i$ (respectively $T_j$) on $\mathbf{S}$ |
| $\sigma^{-1}(S_i)$ ($\sigma^{-1}(T_j)$) | the next node upstream from $S_i$ (respectively $T_j$) on $\mathbf{S}$ |
| $\tau(S_i)$ ($\tau(T_j)$) | the next node downstream from $S_i$ (respectively $T_j$) on $\mathbf{T}$ |
| $\tau^{-1}(S_i)$ ($\tau^{-1}(T_j)$) | the next node upstream from $S_i$ (respectively $T_j$) on $\mathbf{T}$ |
| $\chi_W$ ($\chi_P$) | delay over working (protection) path |
| $F_S(S_i)$ ($F_T(S_i)$) | buffers at node $S_i$ used for transmission on the $\mathbf{S}$ ($\mathbf{T}$) paths |
| $\alpha_{i \leftrightarrow j,k}$ | scaling coefficient used for connection between $S_i$ and $T_j$ on $P_k$ |
| $y_e$ ($z_e$) | the data unit transmitted on link $e \in \mathbf{S}$ ($e \in \mathbf{T}$, respectively) |
| $K$ | the total number of protection paths, i.e., $|\mathcal{P}|$ |

The two ordered sets, $\mathcal{S} = (S_1, S_2, \ldots, S_N)$ and $\mathcal{T} =
(T_1, T_2, \ldots, T_N)$, are of equal lengths, N, which is the number
of connections that are jointly protected. If two nodes communicate,
then they must be in different ordered sets. These
two ordered sets define the order in which the protection path,
P, traverses the connections' end nodes. The ordered set of
nodes in $\mathcal{S}$ is enumerated in one direction, and the ordered
set of nodes in $\mathcal{T}$ is enumerated in the opposite direction on
the path. The nodes are enumerated such that one of the two
end nodes of P is labeled $S_1$. Proceeding on P and inspecting
the next node, if the node does not communicate with a node
that has already been enumerated, it will be the next node
in $\mathcal{S}$, using ascending indices for $S_i$. Otherwise, it will be in
$\mathcal{T}$, using descending indices for $T_i$. Therefore, node $T_1$ will
always be the other end node on P. The example in Figure 1
shows how ten nodes, in five connections, are assigned to $\mathcal{S}$
and $\mathcal{T}$. The bidirectional protection path is shown as a dashed
line.

Fig. 1. An example of enumerating the nodes in five connections. Node $T_5$
is the first node to be encountered while traversing $\mathbf{S}$, which communicates
with a node in $\mathcal{S}$ that has already been enumerated ($S_2$).

Under normal working conditions the working circuit will
be used to deliver $d_i$ and $u_j$ data units from $S_i$ to $T_j$ and from
$T_j$ to $S_i$, respectively. The basic idea for receiving a second
copy of data $u_j$ by node $S_i$, for example, is to receive on two
opposite directions on the protection path, P, the signals given
by the following two equations, where all data units belong to
the same round, n:

$$\sum_{k,\, S_k \in \mathcal{A}} d_k + \sum_{k,\, T_k \in \mathcal{B}} \hat{d}_k \tag{1}$$

$$u_j + \sum_{k,\, T_k \in \mathcal{B}} u_k + \sum_{k,\, S_k \in \mathcal{A}} \hat{u}_k \tag{2}$$

where $\mathcal{A}$ and $\mathcal{B}$ are disjoint subsets of nodes in the ordered
sets of nodes $\mathcal{S}$ and $\mathcal{T}$, respectively, such that a node in $\mathcal{A}$
communicates with a node in $\mathcal{B}$, and vice versa. If the link
between $S_i$ and $T_j$ fails, then $u_j$ can be recovered by $S_i$ by
simply adding equations (1) and (2).

We now outline the steps involved in the construction 
of the primary/protection paths and the encoding/decoding 
operations at the individual nodes. 

A. Protection Path Construction and Node Enumeration 

1) Find a bidirectional path (footnote 5), P, that goes through all the
end nodes of the connections in $\mathcal{N}$. P consists of two
unidirectional paths in opposite directions. These two
unidirectional paths do not have to traverse the same
links, but must traverse the nodes in the opposite order.
One of these paths will be referred to as $\mathbf{S}$ and the other
one as $\mathbf{T}$.

2) Given the set of nodes in all N connections which are
to be protected together, construct the ordered sets of
nodes, $\mathcal{S}$ and $\mathcal{T}$, as explained above.

3) A node $S_i$ in $\mathcal{S}$ ($T_j$ in $\mathcal{T}$) transmits $d_i$ ($u_j$) data units to
a node in $\mathcal{T}$ ($\mathcal{S}$) on the primary path, which is received
as $\hat{d}_i$ ($\hat{u}_j$).

4) Transmissions on the two unidirectional paths $\mathbf{S}$ and
$\mathbf{T}$ are in rounds, and are started by nodes $S_1$ and $T_1$,
respectively. All the processing of data units occurs
between data units belonging to the same round.

5 The path is not necessarily a simple path, i.e., vertices and links may be
repeated. We make this assumption in order to allow the implementation of
our proposed scheme in networks where some nodes have a nodal degree of
two. Although the graph theoretic name for this type of paths is a walk, we
continue to use the term path for ease of notation and description.

Fig. 2. An example of provisioning and protecting four connections on
NSFNET.
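
The node enumeration rule in step 2 can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the helper name `enumerate_nodes` is hypothetical, each end node is assumed to appear exactly once in the traversal order and to belong to exactly one connection, and the sample input uses the end nodes of connections (3,12) and (4,10) in the order in which the protection path (3,4,5,8,10,12) of Figure 2 visits them.

```python
def enumerate_nodes(end_nodes_in_path_order, peer):
    """Label connection end nodes as S_i / T_j while walking the protection path.

    end_nodes_in_path_order: end nodes in the order the path P visits them.
    peer: dict mapping each end node to the node it communicates with.
    """
    labels = {}
    n_connections = len(end_nodes_in_path_order) // 2
    next_s, next_t = 1, n_connections      # S indices ascend, T indices descend
    for node in end_nodes_in_path_order:
        if peer[node] in labels:           # peer already enumerated -> node is in T
            labels[node] = f"T{next_t}"
            next_t -= 1
        else:                              # otherwise it is the next node in S
            labels[node] = f"S{next_s}"
            next_s += 1
    return labels

peer = {3: 12, 12: 3, 4: 10, 10: 4}
labels = enumerate_nodes([3, 4, 10, 12], peer)
print(labels)
```

With this input the first end node becomes S1 and the last one T1, as the rule requires.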
It is to be noted that it may not be possible to protect all 
connections together, and therefore it would be necessary 
to partition the set of connections, and protect connections 
in each partition together. We illustrate this point using the 
example shown in Figure 2, where there are four connections
(shown using bold lines) that are provisioned on NSFNET:
$C_1 = (3,12)$, $C_2 = (4,10)$, $C_3 = (0,7)$ and $C_4 = (1,11)$.
It is not possible to protect all four connections together 
using one protection path that is link disjoint from all four 
connections. Therefore, in this example, we use two protection 
paths: one protection path (3,4,5,8,10,12) protecting $C_1$ and
$C_2$, shown in dashed lines; and another protection path
(0,1,3,4,6,7,10,13,11) protecting $C_3$ and $C_4$, shown in
dotted lines. Notice that all connections that are protected
together, and their protection path, are link disjoint. The end
nodes in $C_1$ and $C_2$ are labeled $S_1$, $S_2$, $T_1$ and $T_2$, while
the end nodes in $C_3$ and $C_4$ are labeled $S'_1$, $S'_2$, $T'_1$ and $T'_2$,
respectively. In the above example, it is assumed that each
connection is established at an electronic layer, i.e., an overlay 
layer above the physical layer. For example, the working path 
of a connection can be routed and established as an MPLS 
Label Switched Path (LSP), which can be explicitly routed in 
the network, as shown in the figure, and therefore the paths of 
the connections which are jointly protected, e.g., $C_1$ and $C_2$ in
the above example, can be made link disjoint. However, when 
it comes to the protection path, since the data units transmitted 
on this path need to be processed, the protection path can be 
provisioned as segments, where each segment is an MPLS 
LSP which is explicitly routed. For the example of Figure 2,
the protection path protecting connections $C_1$ and $C_2$ can be
provisioned as three MPLS LSPs, namely, (3,4), (4,5,8,10) and
(10,12).
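
The segmentation of a protection path at the connection end nodes can be illustrated with a short Python sketch. The function name `split_into_segments` is a hypothetical helper, and the sketch assumes that the path starts and ends at connection end nodes, as required of the two extreme nodes of P.

```python
def split_into_segments(path, end_nodes):
    """Cut a protection path into segments that terminate at connection end nodes."""
    segments = []
    current = [path[0]]
    for node in path[1:]:
        current.append(node)
        if node in end_nodes:          # the segment is terminated and processed here
            segments.append(tuple(current))
            current = [node]           # the next segment starts at the same node
    return segments

# Protection path for C1 = (3,12) and C2 = (4,10) from the example above.
segments = split_into_segments((3, 4, 5, 8, 10, 12), {3, 4, 10, 12})
print(segments)
```

Each resulting tuple corresponds to one explicitly routed segment of the protection path.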

B. Encoding Operations on S and T 

The network encoding operation is executed by each node
in $\mathcal{S}$ and $\mathcal{T}$. To facilitate the specification of the encoding
protocol we first define the following.

• $T(S_i)$: node in $\mathcal{T}$ transmitting to and receiving from $S_i$,
e.g., in Fig. 1, $T(S_1) = T_2$.

• $S(T_j)$: node in $\mathcal{S}$ transmitting to and receiving from $T_j$.

• $\sigma(S_i)$ / $\sigma(T_j)$: the next node downstream from $S_i$ (respectively
$T_j$) on $\mathbf{S}$, e.g., in Fig. 1, $\sigma(S_2) = S_3$.

• $\sigma^{-1}(S_i)$ / $\sigma^{-1}(T_j)$: the next node upstream from $S_i$ (respectively
$T_j$) on $\mathbf{S}$, e.g., in Fig. 1, $\sigma^{-1}(T_5) = S_4$.

• $\tau(S_i)$ / $\tau(T_j)$: the next node downstream from $S_i$ (respectively
$T_j$) on $\mathbf{T}$, e.g., in Fig. 1, $\tau(T_4) = S_5$.

• $\tau^{-1}(S_i)$ / $\tau^{-1}(T_j)$: the next node upstream from $S_i$ (respectively
$T_j$) on $\mathbf{T}$, e.g., in Fig. 1, $\tau^{-1}(S_5) = T_4$.

We denote the data unit transmitted on link $e \in \mathbf{S}$ by $y_e$ and
the data unit transmitted on link $e \in \mathbf{T}$ by $z_e$. Assume that
nodes $S_i$ and $T_j$ are in the same connection. The encoding
operations work as follows, where all data units belong to the
same round.

1) Encoding operations at $S_i$. The node $S_i$ has access to
data units $d_i$ (that it generated) and data unit $\hat{u}_j$ received
on the primary path from $T_j$.

a) It computes $y_{\sigma^{-1}(S_i) \to S_i} + (d_i + \hat{u}_j)$ and sends it
on the link $S_i \to \sigma(S_i)$; i.e.,

$$y_{S_i \to \sigma(S_i)} = y_{\sigma^{-1}(S_i) \to S_i} + (d_i + \hat{u}_j).$$

b) It computes $z_{\tau^{-1}(S_i) \to S_i} + (d_i + \hat{u}_j)$ and sends it
on the link $S_i \to \tau(S_i)$; i.e.,

$$z_{S_i \to \tau(S_i)} = z_{\tau^{-1}(S_i) \to S_i} + (d_i + \hat{u}_j).$$

2) Encoding operations at $T_j$. The node $T_j$ has access to
data units $u_j$ (that it generated) and data unit $\hat{d}_i$ received
on the primary path from $S_i$.

a) It computes $y_{\sigma^{-1}(T_j) \to T_j} + (\hat{d}_i + u_j)$ and sends it
on the link $T_j \to \sigma(T_j)$; i.e.,

$$y_{T_j \to \sigma(T_j)} = y_{\sigma^{-1}(T_j) \to T_j} + (\hat{d}_i + u_j).$$

b) It computes $z_{\tau^{-1}(T_j) \to T_j} + (\hat{d}_i + u_j)$ and sends it
on the link $T_j \to \tau(T_j)$; i.e.,

$$z_{T_j \to \tau(T_j)} = z_{\tau^{-1}(T_j) \to T_j} + (\hat{d}_i + u_j).$$

An example in which three nodes perform this procedure in
the absence of failures is shown in Figure 3.
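
The per-node encoding step can be sketched in a few lines of Python. This is only an illustration under assumptions: data units are modeled as small integers, the function name `forward` and the sample values are hypothetical, and three nodes in a row on one direction of the protection path are assumed, with no failures.

```python
def forward(incoming: int, own: int, received: int) -> int:
    """One encoding step: add the node's own data unit and the unit it
    received on its primary path to the combination arriving on the
    protection path (addition over GF(2^m) is bitwise XOR)."""
    return incoming ^ own ^ received

d = [0x11, 0x22, 0x33]     # hypothetical units generated by three consecutive nodes
u_hat = [0xA1, 0xB2, 0xC3] # hypothetical units they received on their primary paths

signal = 0                 # the first node on the path has nothing incoming
outgoing = []
for d_k, u_k in zip(d, u_hat):
    signal = forward(signal, d_k, u_k)
    outgoing.append(signal)

# The combination leaving the third node contains all six data units.
assert outgoing[2] == 0x11 ^ 0xA1 ^ 0x22 ^ 0xB2 ^ 0x33 ^ 0xC3
```

The same function serves both directions of the protection path; only the incoming combination differs.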

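Consider $\mathcal{S}' \subset \mathcal{S}$ and let $\mathcal{N}(\mathcal{S}')$ represent the subset of
nodes in $\mathcal{T}$ that have a primary path connection to the nodes
in $\mathcal{S}'$ (similar notation shall be used for a subset $\mathcal{T}' \subset \mathcal{T}$).
Let $D_{\mathbf{S}}(S_i)$ and $U_{\mathbf{S}}(S_i)$ represent the sets of downstream and
upstream nodes of $S_i$ on the protection path $\mathbf{S}$ (similar notation
shall be used for the protection path $\mathbf{T}$). When all nodes in $\mathcal{S}$
and $\mathcal{T}$ have performed their encoding operations, the signals
received at a node $S_i$ on the $\mathbf{S}$ and $\mathbf{T}$ paths, respectively, are
as follows:

$$
\begin{aligned}
y_{\sigma^{-1}(S_i) \to S_i}
={}& \underbrace{\sum_{\{k:\, S_k \in U_{\mathbf{S}}(S_i) \cap \mathcal{S}\}} d_k
   + \sum_{\{k:\, T_k \in \mathcal{N}(U_{\mathbf{S}}(S_i) \cap \mathcal{S})\}} \hat{u}_k}_{\text{from nodes upstream of } S_i \text{ on } \mathbf{S} \text{ in } \mathcal{S}} \\
&+ \underbrace{\sum_{\{k:\, T_k \in U_{\mathbf{S}}(S_i) \cap \mathcal{T}\}} u_k
   + \sum_{\{k:\, S_k \in \mathcal{N}(U_{\mathbf{S}}(S_i) \cap \mathcal{T})\}} \hat{d}_k}_{\text{from nodes upstream of } S_i \text{ on } \mathbf{S} \text{ in } \mathcal{T}},
   \text{ and}
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
z_{\tau^{-1}(S_i) \to S_i}
={}& \underbrace{\sum_{\{k:\, S_k \in U_{\mathbf{T}}(S_i) \cap \mathcal{S}\}} d_k
   + \sum_{\{k:\, T_k \in \mathcal{N}(U_{\mathbf{T}}(S_i) \cap \mathcal{S})\}} \hat{u}_k}_{\text{from nodes upstream of } S_i \text{ on } \mathbf{T} \text{ in } \mathcal{S}} \\
&+ \underbrace{\sum_{\{k:\, T_k \in U_{\mathbf{T}}(S_i) \cap \mathcal{T}\}} u_k
   + \sum_{\{k:\, S_k \in \mathcal{N}(U_{\mathbf{T}}(S_i) \cap \mathcal{T})\}} \hat{d}_k}_{\text{from nodes upstream of } S_i \text{ on } \mathbf{T} \text{ in } \mathcal{T}}.
\end{aligned}
\tag{4}
$$

Similar equations can be derived for node $T_j$.

Fig. 3. Example of three nodes performing the encoding procedure. Note
that the addition (bitwise XOR) of two copies of the same data unit, e.g., $d_1$
and $\hat{d}_1$, removes both of them.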

C. Recovery from failures 

The encoding operations described in Subsection III-B allow
the recovery of a second copy of the same data unit transmitted
on the working circuit, hence protecting against single link
failures. To illustrate this, suppose that the primary path
between nodes $S_i$ and $T_j$ fails. In this case, $S_i$ does not
receive $u_j$ on the primary path, and it receives $\hat{u}_j = 0$ instead.
Moreover, $\hat{d}_i = 0$. However, $S_i$ can recover $u_j$ by adding
equations (3) and (4). In particular, node $S_i$ computes

$$
\begin{aligned}
y_{\sigma^{-1}(S_i) \to S_i} + z_{\tau^{-1}(S_i) \to S_i}
&= \sum_{\{k:\, S_k \in \mathcal{S} \setminus \{S_i\}\}} d_k + \sum_{\{k:\, T_k \in \mathcal{T}\}} u_k
 + \sum_{\{k:\, T_k \in \mathcal{T} \setminus \{T_j\}\}} \hat{u}_k + \sum_{\{k:\, S_k \in \mathcal{S}\}} \hat{d}_k \\
&= \hat{d}_i + u_j \\
&= u_j \quad (\text{since } \hat{d}_i = 0).
\end{aligned}
\tag{5}
$$



Similarly, $T_j$ can recover $d_i$ by adding the values it obtains
over $\mathbf{S}$ and $\mathbf{T}$. For example, if the working path between $S_2$
and $T_2$ in Figure 3 fails, then node $S_2$ adds the signal
received on $\mathbf{S}$ to the signal received on $\mathbf{T}$ to recover $u_2$,
since $T_2$ generated $u_2$. Also, node $T_2$ adds the
signals on $\mathbf{S}$ and $\mathbf{T}$ to recover $d_2$.

Notice that the reception of a second copy of $u_2$ and $d_2$ at
$S_2$ and $T_2$, respectively, when there are no failures, requires
the addition of the $d_2$ and $u_2$ signals generated by the same
nodes, respectively.

As a more general example, consider the case in Figure 1.
Node $S_5$, for example, will receive the following signal on $\mathbf{S}$:

$$(d_1 + \hat{u}_2) + (d_2 + \hat{u}_5) + (d_3 + \hat{u}_1) + (d_4 + \hat{u}_4) + (u_5 + \hat{d}_2), \tag{6}$$

and will receive the following on $\mathbf{T}$:

$$(u_1 + \hat{d}_3) + (u_2 + \hat{d}_1) + (u_3 + \hat{d}_5) + (u_4 + \hat{d}_4). \tag{7}$$

If the link between $S_5$ and $T_3$ fails, then $\hat{d}_5 = 0$, and adding
equations (6) and (7) will recover $u_3$ at $S_5$.
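
The single-failure recovery can be checked numerically with a small simulation. The sketch below is illustrative only: it assumes the node order and pairing implied by the Figure 1 example (S1 with T2, S2 with T5, S3 with T1, S4 with T4, S5 with T3), models data units as random 8-bit integers, and uses hypothetical helper names `arriving` and `simulate`.

```python
import random

# Order in which one direction of the protection path visits the end nodes
# (assumed from the Figure 1 example); the other direction is the reverse.
ORDER_S = ["S1", "S2", "S3", "S4", "T5", "S5", "T4", "T3", "T2", "T1"]
PEER = {"S1": "T2", "S2": "T5", "S3": "T1", "S4": "T4", "S5": "T3"}
PEER.update({t: s for s, t in list(PEER.items())})

def arriving(order, own, received):
    """Combination arriving at each node along one direction of the path."""
    signals, running = {}, 0
    for node in order:
        signals[node] = running
        running ^= own[node] ^ received[node]   # the node's encoding step
    return signals

def simulate(failed, seed=0):
    """Fail one working path and let both of its end nodes decode."""
    rng = random.Random(seed)
    own = {n: rng.randrange(1, 256) for n in ORDER_S}
    # A failed working path delivers an all-zero data unit.
    received = {n: 0 if {n, PEER[n]} == set(failed) else own[PEER[n]]
                for n in ORDER_S}
    y = arriving(ORDER_S, own, received)          # signals on one direction
    z = arriving(ORDER_S[::-1], own, received)    # signals on the other
    recovered = {n: y[n] ^ z[n] for n in failed}
    return own, recovered

own, recovered = simulate(("S5", "T3"))
assert recovered["S5"] == own["T3"]   # S5 recovers the unit sent by T3
assert recovered["T3"] == own["S5"]   # T3 recovers the unit sent by S5
```

Every intact connection contributes each of its data units twice to the sum, so only the unit that was lost on the failed working path survives.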

IV. Implementation Issues 

In this section we address a number of practical implementation
issues.

A. Round Numbers 

Since linear combinations include packets belonging to 
the same round number, the packet header should include a 
round number field. The field is initially reset to zero, and is 
updated independently by each node when it generates and 
sends a new packet on the working circuit. Note that there 
will be a delay before the linear combination propagating on 
$\mathbf{S}$ and $\mathbf{T}$ reaches a given node. For example, in Figure 3,
assuming that all nodes started transmission at time 0, node $S_3$
shall receive the combination corresponding to round 0 over
$\mathbf{S}$, $d_1(0) + \hat{u}_1(0) + d_2(0) + \hat{u}_2(0)$, after a delay corresponding
to the propagation delay between nodes $S_1$ and $S_3$, in addition
to the processing and transmission times at nodes $S_1$ and $S_2$.
However, since the received data unit shall contain the round
number 0, it shall be combined with the data unit generated
by $S_3$ at time slot 0.

The size of the round number field depends on the delay 
of the protection path, including processing and transmission 
times, as well as propagation time, and the working circuit 
delay. It is reasonable to assume that the delay of any working 
circuit is shorter than that of the protection circuit; otherwise, 
the protection path could have been used as a working path. 
Thus, when a data unit on the protection path corresponding 
to a particular round number reaches a given node, the data 
unit of that round number would have already been received 
on the primary path of the node. 

In this case, it is straightforward to see that once a data
unit is transmitted on the working circuit, it will take no
more than twice the delay of the protection path for the receiver
to recover the backup copy of this data unit. Round numbers
can therefore be reused after this time. Based on this argument,
the size of the set of required unique round numbers is upper
bounded by $2a$, where

$$a = \left\lceil \frac{\chi_P}{(\text{Protection data unit size in bits})/B} \right\rceil .$$

$\chi_P$ in the above equation is the delay over the protection
circuit, and B is the transport capacity of the protection circuit,
which, as stated in Section II-B, is taken as the maximum
over all the transport capacities of the protected connections.
A sufficiently long round number field will require no more
than $\log_2(2a)$ bits.
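
As a worked illustration of this bound, the following Python sketch computes a and the resulting field width. The function name `round_field_bits` and the numeric values (a 20 ms protection path delay, 12000-bit data units and B = 100 Mbps) are assumptions chosen only for the example, and the bit count is rounded up to a whole number of bits.

```python
import math

def round_field_bits(protection_delay_s, data_unit_bits, capacity_bps):
    """Return (a, bits): the quantity a from the bound above and the number
    of bits needed to represent 2a distinct round numbers."""
    slot = data_unit_bits / capacity_bps            # time to send one data unit
    a = math.ceil(protection_delay_s / slot)
    bits = (2 * a - 1).bit_length()                 # integer form of ceil(log2(2a))
    return a, bits

a, bits = round_field_bits(0.020, 12000, 100e6)
print(a, bits)
```

Under these assumed numbers one data unit takes 0.12 ms to transmit, so roughly 167 rounds fit within the protection path delay.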

B. Synchronization 

An important issue is node synchronization to rounds. This
can be achieved using a number of strategies. A simple strategy
for initialization and synchronization is the following:

• In addition to buffers used to store transmitted and
received data units, each node $S_i \in \mathcal{S}$ has two buffers,
$F_S(S_i)$ and $F_T(S_i)$, which are used for transmissions on
the $\mathbf{S}$ and $\mathbf{T}$ paths, respectively. Node $T_j \in \mathcal{T}$ also has
similar buffers, $F_S(T_j)$ and $F_T(T_j)$.

• Node $S_1$ starts the transmission of $d_1(0)$ on the working
circuit to $T(S_1)$. When $S_1$ receives $\hat{u}_{T(S_1)}(0)$, it forms
$d_1(0) + \hat{u}_{T(S_1)}(0)$ and transmits it on the outgoing link in
$\mathbf{S}$. Similarly, node $T_1$ will transmit $u_1(0)$ on the working
circuit, and $u_1(0) + \hat{d}_{S(T_1)}(0)$ on the outgoing link in $\mathbf{T}$.

• Node $S_i$, for $i > 1$, will buffer the combinations received
on $\mathbf{S}$ in $F_S(S_i)$. Assume that the combination with the
smallest round number buffered in $F_S(S_i)$ (i.e., head
of buffer) corresponds to round number n. When $S_i$
transmits $d_i(n)$ and receives $\hat{u}_{T(S_i)}(n)$, it adds
those data units to the combination with the smallest
round number in $F_S(S_i)$ and transmits the combination
on $\mathbf{S}$. The combination with round number n is then
purged from $F_S(S_i)$. Similar operations are performed
on $F_T(S_i)$, $F_S(T_j)$ and $F_T(T_j)$. Note that purging of
the data unit from the buffer only implies that the
combination corresponding to round n has been sent and
should not be sent again. However, node $S_i$ needs to
ensure that it saves the value of the data unit received on
$\mathbf{S}$ as long as needed for it to be able to decode $u_{T(S_i)}(n)$
if needed. An illustration of the use of those buffers is
shown in Figure 4.
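
The self-clocked use of a buffer such as $F_S(S_i)$ can be sketched as below. This is a simplified illustration, not the authors' implementation: the class name `PathBuffer`, its methods and the dictionaries keyed by round number are hypothetical, data units are modeled as integers, and combinations are assumed to arrive in round order.

```python
from collections import deque

class PathBuffer:
    """Sketch of a per-direction buffer holding combinations received on the
    protection path, ordered by round number."""

    def __init__(self):
        self.queue = deque()   # (round, combination); head = smallest round
        self.saved = {}        # received combinations kept for later decoding

    def on_receive(self, rnd, combination):
        self.queue.append((rnd, combination))
        self.saved[rnd] = combination

    def try_forward(self, own, received):
        """own / received: dicts mapping a round number to the data unit sent /
        received on the working circuit. Returns (round, outgoing combination),
        or None if the head of the buffer cannot be processed yet."""
        if not self.queue:
            return None
        rnd, combination = self.queue[0]
        if rnd not in own or rnd not in received:
            return None        # wait until both units of this round are available
        self.queue.popleft()   # purge: this round will not be sent again
        return rnd, combination ^ own[rnd] ^ received[rnd]

buf = PathBuffer()
buf.on_receive(0, 0b0101)
assert buf.try_forward({}, {}) is None                 # round 0 not yet ready
assert buf.try_forward({0: 0b0011}, {0: 0b1000}) == (0, 0b1110)
assert buf.saved[0] == 0b0101                          # still available for decoding
```

Keeping `saved` separate from `queue` mirrors the remark that purging only prevents retransmission, while the received value remains available for decoding.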

C. Buffer Size 

Assuming that all nodes start transmitting simultaneously,
then all nodes would have decoded the data units corresponding
to a given round number in a time that does not exceed

$$\chi_P + \max_{1 \le w \le N} \chi_w ,$$

where $\chi_w$ is the delay over working path w.

Based on this, the following upper bounds on buffer sizes
can be established:

• The transmit buffer, as well as the $F_S$ and $F_T$ buffers, are
upper bounded by

$$\left\lceil \frac{\chi_P + \max_{1 \le w \le N} \chi_w}{\text{Data unit size in bits}/B} \right\rceil .$$

This is because it will take $\chi_w$ units of time over the path
w used by the connection $S(T_1) \leftrightarrow T_1$ to receive $\hat{d}_{S(T_1)}$,
and then start transmission on the $\mathbf{T}$ path. An additional
$\chi_P$ units of time is required for the first combination to
reach $S_1$. The numerator in the above equation is the
maximum of this delay.

• The receive buffer is upper bounded by

$$\left\lceil \frac{\chi_P + \max_{1 \le w \le N} \chi_w - \min_{1 \le w \le N} \chi_w}{\text{Data unit size in bits}/B} \right\rceil .$$

The numerator in the above equation is derived using
arguments similar to the transmit buffer, except that for
the first data unit to be received, it will have to encounter
the delay over the working circuit; hence, the subtraction
of the minimum such delay.

Fig. 4. An illustration of the use of node buffer $F_S(S_i)$. (a) Shows the status
of the buffers before the data unit at round n has been processed. (b) Shows the
status of the buffers after the data unit at round n has been processed. Note
that the data units corresponding to round n have been purged from both
$F_S(S_i)$ and the primary path receive buffer. The operation of other buffers
is similar.
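
The two bounds can be evaluated with a few lines of Python. The sketch below is illustrative: the function name `buffer_bounds` and all numeric values are assumptions, and the results are expressed in data units.

```python
import math

def buffer_bounds(protection_delay, working_delays, data_unit_bits, capacity):
    """Return (transmit_bound, receive_bound) in data units.

    protection_delay: delay over the protection path, in seconds.
    working_delays:   delays over the N working paths, in seconds.
    """
    slot = data_unit_bits / capacity          # transmission time of one data unit
    longest = max(working_delays)
    shortest = min(working_delays)
    transmit = math.ceil((protection_delay + longest) / slot)
    receive = math.ceil((protection_delay + longest - shortest) / slot)
    return transmit, receive

# Assumed values: 20 ms protection delay, working delays of 4 ms and 9 ms,
# 12000-bit data units and a transport capacity of 100 Mbps.
print(buffer_bounds(0.020, [0.004, 0.009], 12000, 100e6))
```

The receive bound is never larger than the transmit bound, since it subtracts the smallest working path delay from the same numerator.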

V. Protection against multiple faults 

We now consider the situation when protection against
multiple (more than one) link failures is required. In this case it
is intuitively clear that a given primary path connection needs
to be protected by multiple bi-directional protection paths. To
see this we first analyze the sum of the signals received on
$\mathbf{S}$ and $\mathbf{T}$ for a node $S_i$ that has a connection to node $T_j$
when the primary paths $S_i \leftrightarrow T_j$ and $S_{i'} \leftrightarrow T_{j'}$ protected by
the same protection path are in failure. In this case we have
$\hat{u}_j = \hat{d}_i = \hat{u}_{j'} = \hat{d}_{i'} = 0$. Therefore, at node $S_i$ we have

$$
\begin{aligned}
y_{\sigma^{-1}(S_i) \to S_i} + z_{\tau^{-1}(S_i) \to S_i}
&= \sum_{\{k:\, S_k \in \mathcal{S} \setminus \{S_i\}\}} d_k + \sum_{\{k:\, T_k \in \mathcal{T}\}} u_k
 + \sum_{\{k:\, T_k \in \mathcal{T} \setminus \{T_j\}\}} \hat{u}_k + \sum_{\{k:\, S_k \in \mathcal{S}\}} \hat{d}_k \\
&= (d_{i'} + u_{j'}) + u_j .
\end{aligned}
$$

Note that node $S_i$ is only interested in the data unit $u_j$ but it
can only recover the sum of $u_j$ and the term $(d_{i'} + u_{j'})$, in
which it is not interested.
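
A small numerical check of this observation is given below as a Python sketch. The topology (three connections S1 with T1, S2 with T2, S3 with T3, visited in the order S1, S2, S3, T3, T2, T1), the helper name `sum_at` and the data values are all assumptions made for illustration.

```python
ORDER = ["S1", "S2", "S3", "T3", "T2", "T1"]
PEER = {"S1": "T1", "S2": "T2", "S3": "T3",
        "T1": "S1", "T2": "S2", "T3": "S3"}

def sum_at(node, own, failed_pairs):
    """Sum of the two signals a node receives on the protection path: every
    other node contributes its own unit plus the unit it received."""
    down = {n for pair in failed_pairs for n in pair}
    total = 0
    for other in ORDER:
        if other == node:
            continue
        received = 0 if other in down else own[PEER[other]]
        total ^= own[other] ^ received
    return total

own = {"S1": 0x11, "S2": 0x22, "S3": 0x33,
       "T1": 0xA1, "T2": 0xB2, "T3": 0xC3}

# Two working paths protected by the same protection path fail together.
mixed = sum_at("S1", own, [("S1", "T1"), ("S2", "T2")])
assert mixed == own["S2"] ^ own["T2"] ^ own["T1"]   # (d_2 + u_2) + u_1
assert mixed != own["T1"]                           # u_1 alone is not recoverable
```

With a single protection path the node obtains one equation in two unknown terms, which motivates the use of several protection paths and scaling coefficients.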

We now demonstrate that if a given connection is protected 
by multiple protection paths, a modification of the protocol 
presented in Section III-B can enable the nodes to recover from
multiple failures. In the modified protocol a node multiplies
the sum of its own data unit and the data unit received over
its primary path by an appropriately chosen scaling coefficient
before adding it to the signals on the protection path. The
scheme in Section III-B can be considered to be a special
case of this protocol when the scaling coefficient is 1 (i.e., the
multiplicative identity element of GF(2^m)).

It is important to note that in contrast to the approach 
presented in [14], this protocol does not require any syn- 
chronization between the operation of the different protection 
paths. 

As before, suppose that there are N bi-directional unicast
connections that are to be protected against the failure of any
M links, for M < N. These connections are now protected
by K protection paths $P_k$, $k = 1, \ldots, K$. Protection path $P_k$
passes through all nodes $\mathcal{S}_k \subseteq \mathcal{S}$ and $\mathcal{T}_k \subseteq \mathcal{T}$, where the
nodes in $\mathcal{S}_k$ communicate bi-directionally with the nodes in
$\mathcal{T}_k$. Note that $\cup_{k=1}^{K} \mathcal{S}_k = \mathcal{S}$ and $\cup_{k=1}^{K} \mathcal{T}_k = \mathcal{T}$. The ordered
sets $\mathcal{S}_k$ and $\mathcal{S}_l$ are not necessarily disjoint for $l \neq k$, i.e., a
primary path can be protected by different protection paths.
However, if two protection paths are used to protect the same
working connection, then they must be link disjoint.

A. Modified Encoding Operation 

Assume that nodes $S_i$ and $T_j$ are protected by the protection path $P_k$. The encoding operations performed by $S_i$ and $T_j$ for path $P_k$ are explained below (the operations for other protection paths are similar). In the presentation below we shall use the notation $\sigma(S_i)$, $\sigma^{-1}(S_i)$, $\tau(S_i)$, $\tau^{-1}(S_i)$ to be defined implicitly over the protection path $P_k$. Similar notation is used for $T_j$.

The nodes $S_i$ and $T_j$ initially agree on a value of the scaling coefficient, denoted $\alpha_{i \leftrightarrow j,k} \in GF(2^m)$. The subscript $i \leftrightarrow j,k$ denotes that the scaling coefficient is used for connection $S_i$ to $T_j$ over protection path $P_k$.

1) Encoding operations at $S_i$. The node $S_i$ has access to data unit $d_i$ (that it generated) and data unit $u_j$ received on the primary path from $T_j$.

a) It computes $y_{\sigma^{-1}(S_i) \to S_i} + \alpha_{i \leftrightarrow j,k}(d_i + u_j)$ and sends it on the link $S_i \to \sigma(S_i)$; i.e.

$$y_{S_i \to \sigma(S_i)} = y_{\sigma^{-1}(S_i) \to S_i} + \alpha_{i \leftrightarrow j,k}(d_i + u_j).$$

b) It computes $z_{\tau^{-1}(S_i) \to S_i} + \alpha_{i \leftrightarrow j,k}(d_i + u_j)$ and sends it on the link $S_i \to \tau(S_i)$; i.e.

$$z_{S_i \to \tau(S_i)} = z_{\tau^{-1}(S_i) \to S_i} + \alpha_{i \leftrightarrow j,k}(d_i + u_j).$$

2) Encoding operations at $T_j$. The node $T_j$ has access to data unit $u_j$ (that it generated) and data unit $d_i$ received on the primary path from $S_i$.

a) It computes $y_{\sigma^{-1}(T_j) \to T_j} + \alpha_{i \leftrightarrow j,k}(d_i + u_j)$ and sends it on the link $T_j \to \sigma(T_j)$; i.e.

$$y_{T_j \to \sigma(T_j)} = y_{\sigma^{-1}(T_j) \to T_j} + \alpha_{i \leftrightarrow j,k}(d_i + u_j).$$

b) It computes $z_{\tau^{-1}(T_j) \to T_j} + \alpha_{i \leftrightarrow j,k}(d_i + u_j)$ and sends it on the link $T_j \to \tau(T_j)$; i.e.

$$z_{T_j \to \tau(T_j)} = z_{\tau^{-1}(T_j) \to T_j} + \alpha_{i \leftrightarrow j,k}(d_i + u_j).$$

It should be clear that we can find expressions similar to the ones derived for the earlier single-failure scheme in this case as well.
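The encoding rules above can be exercised with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes $m = 8$ with the reduction polynomial 0x11B, models a protection path as an ordered list of end nodes, and treats a data unit received over a failed primary path as 0. The names `gf_mul` and `signals_at`, and all numeric values, are hypothetical.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def signals_at(path, pos, alpha, d, u, failed):
    """Return (y, z), the signals reaching path[pos] from the two directions.

    path   : list of (connection id, "S" or "T") in the order visited
    alpha  : scaling coefficient of each connection on this protection path
    d, u   : data units generated at the S end and T end of each connection
    failed : connections whose primary path is down (received unit reads 0)
    """
    def contribution(node):
        c, role = node
        own = d[c] if role == "S" else u[c]
        received = 0 if c in failed else (u[c] if role == "S" else d[c])
        return gf_mul(alpha[c], own ^ received)  # alpha * (own + received)

    y = 0
    for node in path[:pos]:          # nodes upstream in one direction
        y ^= contribution(node)
    z = 0
    for node in path[pos + 1:]:      # nodes upstream in the other direction
        z ^= contribution(node)
    return y, z


# Hypothetical path carrying three connections; connections 1 and 2 fail.
path = [(1, "S"), (2, "S"), (3, "S"), (2, "T"), (1, "T"), (3, "T")]
alpha = {1: 0x02, 2: 0x03, 3: 0x05}
d = {1: 0x11, 2: 0x22, 3: 0x33}
u = {1: 0xA1, 2: 0xB2, 3: 0xC3}
failed = {1, 2}

y, z = signals_at(path, 0, alpha, d, u, failed)   # signals seen by S_1
observed = y ^ z
# Intact connection 3 cancels out; what remains is alpha_2*(d_2+u_2) + alpha_1*u_1.
expected = gf_mul(alpha[2], d[2] ^ u[2]) ^ gf_mul(alpha[1], u[1])
print(observed == expected)
```

In this sketch the contributions of the two ends of an intact connection are identical and cancel under addition in $GF(2^m)$, which is why only the failed connections appear in the sum.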

B. Recovery from failures 

Suppose that the primary paths $S_i \leftrightarrow T_j$ and $S_{i'} \leftrightarrow T_{j'}$ fail, and they are both protected by $P_k$. Consider the sum of the signals received by node $S_i$ over $\mathcal{S}$ and $\mathcal{T}$. Similar to our discussion in Section III-C we can observe that

$$y_{\sigma^{-1}(S_i) \to S_i} + z_{\tau^{-1}(S_i) \to S_i} = \alpha_{i' \leftrightarrow j',k}(d_{i'} + u_{j'}) + \alpha_{i \leftrightarrow j,k} u_j.$$

Note that the structure of the equation allows the node $S_i$ to treat $(d_{i'} + u_{j'})$ as a single unknown. Thus from protection path $P_k$, node $S_i$ obtains one equation in two variables. Now, if there exists another protection path $P_l$ that also protects the connections $S_i \leftrightarrow T_j$ and $S_{i'} \leftrightarrow T_{j'}$, then we can obtain the following system of equations in two variables

$$\begin{bmatrix} \alpha_{i' \leftrightarrow j',k} & \alpha_{i \leftrightarrow j,k} \\ \alpha_{i' \leftrightarrow j',l} & \alpha_{i \leftrightarrow j,l} \end{bmatrix} \begin{bmatrix} (d_{i'} + u_{j'}) \\ u_j \end{bmatrix} = \begin{bmatrix} x^k_{S_i} \\ x^l_{S_i} \end{bmatrix} \qquad (9)$$

where $x^k_{S_i}$ and $x^l_{S_i}$ represent values that can be obtained at $S_i$, and therefore $u_j$ can be recovered by solving the system of equations. The choice of the scaling coefficients needs to be such that the associated $2 \times 2$ matrix in (9) is invertible. This can be guaranteed by a careful assignment of the scaling coefficients. More generally, we shall need to ensure that a large number of such matrices are full-rank. By choosing the operating field $GF(2^m)$ to be large enough, i.e., m to be large enough, we can ensure that such an assignment of scaling coefficients always exists [24]. The detailed discussion of coefficient assignment can be found in Section VI.
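As an illustration of the recovery step, the sketch below solves a two-equation system of the form described above by Cramer's rule. It is only a sketch under stated assumptions: the field is taken to be $GF(2^8)$ with reduction polynomial 0x11B, and the function names, coefficient values and data units are hypothetical.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse via a^(2^8 - 2); a must be nonzero."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def solve_2x2(a11, a12, a21, a22, x1, x2):
    """Solve [[a11, a12], [a21, a22]] [v1, v2]^T = [x1, x2]^T over GF(2^8).

    Addition and subtraction coincide in characteristic 2, so every
    sign in Cramer's rule becomes XOR.
    """
    det = gf_mul(a11, a22) ^ gf_mul(a12, a21)
    if det == 0:
        raise ValueError("scaling coefficients give a singular matrix")
    inv = gf_inv(det)
    v1 = gf_mul(inv, gf_mul(x1, a22) ^ gf_mul(a12, x2))
    v2 = gf_mul(inv, gf_mul(a11, x2) ^ gf_mul(a21, x1))
    return v1, v2


# Hypothetical values: w stands for (d_i' + u_j'), u_j is the wanted data unit.
w, u_j = 0x5A, 0xC3
a_k = (0x02, 0x03)   # coefficients of (w, u_j) on protection path P_k
a_l = (0x04, 0x05)   # coefficients of (w, u_j) on protection path P_l
x_k = gf_mul(a_k[0], w) ^ gf_mul(a_k[1], u_j)
x_l = gf_mul(a_l[0], w) ^ gf_mul(a_l[1], u_j)

w_rec, u_rec = solve_2x2(a_k[0], a_k[1], a_l[0], a_l[1], x_k, x_l)
print(hex(u_rec))
```

If the two protection paths used proportional coefficient pairs, the determinant would vanish and `solve_2x2` would raise, which is the situation the coefficient assignment has to avoid.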

C. Conditions for Data Recovery: 

We shall first discuss the conditions for data recovery under a certain failure pattern. To facilitate the discussion on determining which failures can be recovered from, we represent the failed connections and the protection paths using a bipartite graph, $G_{DR}(V, E)$, where the set of vertices $V = \mathcal{N} \cup \mathcal{P}$, and the set of edges $E \subseteq \mathcal{N} \times \mathcal{P}$, where $\mathcal{N}$ is the set of connections to be protected, and $\mathcal{P}$ is the set of protection paths. There is an edge from connection $N_i \in \mathcal{N}$ to protection path $P_k \in \mathcal{P}$ if $P_k$ protects connection $N_i$. In addition, each edge has a label that is assigned as follows. Suppose that there exists an edge between $N_i$ (between nodes $S_{i'}$ and $T_{j'}$) and $P_k$. The label on the edge is given by the scaling coefficient $\alpha_{i' \leftrightarrow j',k}$.

Note that in general one could have link failures on primary paths as well as protection paths. Suppose that a failure pattern is specified as a set $F = \{N_{i_1}, \ldots, N_{i_n}\} \cup \{P_{j_1}, \ldots, P_{j_{n'}}\}$, where $\{N_{i_1}, \ldots, N_{i_n}\}$ denotes the set of primary paths that have failed and $\{P_{j_1}, \ldots, P_{j_{n'}}\}$ denotes the set of protection paths that have failed. The determination of whether a given node can recover from the failures in F can be performed in the following manner.

1) Initialization. Form the graph $G_{DR}(V, E)$ as explained above.

2) Edge pruning.

a) For all connections $N_i \in \mathcal{N} \setminus F$, remove $N_i$ and all edges in which it participates from $G_{DR}$.

b) For all protection paths $P_k \in F$, remove $P_k$ and all edges in which it participates from $G_{DR}$.

3) Checking the system of equations. Let the residual graph be denoted $G'_{DR} = (\mathcal{N}' \cup \mathcal{P}', E')$. For each connection $N_i \in \mathcal{N}'$, do the following steps.

a) Let the subset of nodes in $\mathcal{P}'$ that have a connection to $N_i$ be denoted $\mathcal{N}(N_i)$. Each node in $\mathcal{N}(N_i)$ corresponds to a linear equation that is available to the nodes participating in $N_i$. The linear combination coefficients are determined by the labels of the edges. Identify this system of equations.

b) Check to see whether a node in $N_i$ can solve this system of equations to obtain the data unit it is interested in.

Fig. 5. An example of a network protected against multiple faults.

In Figure 6 we show an example that applies to the network in Figure 5. Figure 6(a) shows the bipartite graph for the entire network, while Figures 6(b) and 6(c) show the graphs corresponding to the following two failure patterns, respectively:

• $(S_2, T_2)$, $(S_6, T_6)$ and $(S_5, T_5)$

• $P_2$, $(S_2, T_2)$ and $(S_6, T_6)$

Fig. 6. Applying the bipartite graph representation to verify whether failures can be recovered from.

Let us assume that the encoding coefficients are chosen so that the system of equations obtained by each node has a unique solution whenever it has as many equations as unknowns. From Figure 6(b), the failures of connections $(S_2, T_2)$ and $(S_6, T_6)$ can be recovered from because each node obtains two equations in two unknowns. More specifically, at node $S_2$ we obtain the following system of equations (the equation from $P_1$ is not used).

$$\begin{bmatrix} \alpha_{2 \leftrightarrow 2,2} & \alpha_{6 \leftrightarrow 6,2} \\ \alpha_{2 \leftrightarrow 2,3} & \alpha_{6 \leftrightarrow 6,3} \end{bmatrix} \begin{bmatrix} u_2 \\ (d_6 + u_6) \end{bmatrix} = \begin{bmatrix} x^2_{S_2} \\ x^3_{S_2} \end{bmatrix}$$

which has a unique solution if $(\alpha_{2 \leftrightarrow 2,2}\,\alpha_{6 \leftrightarrow 6,3} - \alpha_{2 \leftrightarrow 2,3}\,\alpha_{6 \leftrightarrow 6,2}) \neq 0$. As pointed out in Section V-B, the choice of the scaling coefficients can be made so that all possible matrices involved have full rank by working over a large enough field size. Thus in this case $S_2$ and $T_2$ can recover from the failures. By a similar argument we can observe that $S_6$ and $T_6$ can also recover from the failures by using the equations from $P_2$ and $P_3$. However, $S_5$ and $T_5$ cannot recover from the failure since they can only obtain one equation from $P_1$ in two variables, which correspond to the failures on $(S_2, T_2)$ and $(S_5, T_5)$. In Figure 6(c), path $P_2$ does not exist, and $(S_6, T_6)$ is protected only by path $P_3$, which protects two failed connections. Therefore, it cannot recover from the failure. However, $(S_2, T_2)$ can still recover its data units by using path $P_1$.
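The pruning and checking procedure can be sketched in code as follows. This is an illustrative sketch only: the edge set is limited to what the example text states (it is assumed that $P_1$ protects connections 2 and 5, and that $P_2$ and $P_3$ protect connections 2 and 6; any other edges of Figure 5 are omitted), the field is assumed to be $GF(2^8)$ with polynomial 0x11B, and the edge labels are hypothetical. A connection's unknown is treated as recoverable when removing its column lowers the rank of the coefficient matrix by one.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse via a^254; a must be nonzero."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def gf_rank(rows):
    """Rank over GF(2^8) by Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = gf_inv(m[rank][col])
        m[rank] = [gf_mul(inv, v) for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [v ^ gf_mul(f, w) for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank


def can_recover(conn, labels, failed_conns, failed_paths):
    """labels[(path, connection)] is the scaling coefficient on that edge."""
    # Edge pruning: keep failed connections and intact protection paths only.
    edges = {(p, c): a for (p, c), a in labels.items()
             if c in failed_conns and p not in failed_paths}
    # One equation per intact protection path adjacent to `conn`.
    paths = sorted({p for (p, c) in edges if c == conn})
    if not paths:
        return False
    unknowns = sorted({c for (p, c) in edges if p in paths})
    full = [[edges.get((p, c), 0) for c in unknowns] for p in paths]
    rest = [[edges.get((p, c), 0) for c in unknowns if c != conn] for p in paths]
    return gf_rank(full) == gf_rank(rest) + 1


# Hypothetical labels on the edges inferred from the example.
labels = {("P1", 2): 1, ("P1", 5): 2,
          ("P2", 2): 1, ("P2", 6): 1,
          ("P3", 2): 1, ("P3", 6): 2}

results_b = {c: can_recover(c, labels, {2, 5, 6}, set()) for c in (2, 5, 6)}
results_c = {c: can_recover(c, labels, {2, 6}, {"P2"}) for c in (2, 6)}
print(results_b, results_c)
```

With these assumed labels the sketch reproduces the conclusions of the example: connections 2 and 6 recover and connection 5 does not under the first pattern, while only connection 2 recovers under the second.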

In general, this procedure needs to be performed for every 
possible failure pattern that needs to be protected against, for 
checking whether all nodes can still recover the data unit 
that they are interested in. However, usually the set of failure 
patterns to be protected against is the set of all single link 
failures or more generally the set of all possible M > 1 
link failures. Those M link failures can happen anywhere, 
on primary paths or protection paths. 

Next, we consider general conditions for data recovery. First, we describe the general model for multiple failures. In order to make expressions simple, we assume that the data unit obtained by a node of a failed connection, say $S_i$, from protection path $P_k$ is the sum of the data units from $\mathcal{S}_k$, $\mathcal{T}_k$. Adding to it $\alpha_{i \leftrightarrow j,k} d_i$, where $d_i$ is the data unit generated at node $S_i$, we denote this sum by $p_k$, where $p_k = y_{\sigma^{-1}(S_i) \to S_i} + z_{\tau^{-1}(S_i) \to S_i} + \alpha_{i \leftrightarrow j,k} d_i$. Note that $d_i$ is the local data unit, which is always available. In this case, each node on one protection path $P_k$ obtains the same equation in terms of the same variables. By denoting the set of failed primary connections protected by $P_k$ as $F(P_k)$, the equation for this protection path $P_k$ is

$$\sum_{(S_i, T_j) \in F(P_k)} \alpha_{i \leftrightarrow j,k}(d_i + u_j) = p_k. \qquad (10)$$

In equation (10), each $d_i + u_j$ is considered as one variable and the coefficients assigned to $d_i$ and $u_j$ are the same. Each node of a failed connection will obtain one equation from each intact protection path that protects it and consequently forms a system of linear equations. The number of equations that node $S_i$ obtains is the number of intact protection paths that protect $S_i$. The number of variables is the total number of failed connections protected by the protection paths that also provide protection to the failed connection between $S_i$ and $T_j$.

$S_i$ needs to solve the system of equations and obtain $d_i + u_j$. By subtracting $d_i$, it can get $u_j$, which is the data unit $S_i$ wants to receive, while $T_j$ can retrieve the data unit $d_i$ by subtracting $u_j$ from $d_i + u_j$.

Each protection path maps to an equation in terms of a 
number of variables representing the combination of the data 
units generated at two end nodes of the failed connections 
protected by this path. We can form a system of equations that 
consists of at most K equations like equation (10), where K is
the total number of protection paths. Each failure of a primary 
path introduces a variable whereas each failure occurring on 
a protection path erases the corresponding equation from the 
matrix. In general, the system of equations that a node obtains 
also depends on the topology. If all of the connections are not 
protected by the same protection paths, there are zeros in the 
coefficient matrix because a failed connection is not protected 
by all protection paths, implying that some variables will not 
appear in all equations. 

In order to recover from any failure pattern of M failures, 
we require the following necessary conditions. 

Theorem 1: In order for the network to be guaranteed 
protection against any M link failures, the following necessary 
conditions should be satisfied. 

1) Each node should be protected by at least M link- 
disjoint protection paths. 

2) Under any failure pattern with M failures, a subset of 
equations that each node obtains should have a unique 
solution. 

Proof: The first condition can be shown by contradiction. If a node is protected by only M - 1 protection paths, the failures could happen on these M - 1 protection paths and on the primary path in which this node participates. Then, this node does not have any protection path left to recover from its primary path failure.

The second condition is to ensure that each node can recover the data unit under any failure pattern with M failures. Note that for this necessary condition, we do not require that the whole system of equations each node obtains has a unique solution, because a node is only interested in recovering the data unit sent to it. As long as it can solve a subset of the equations for that data unit, it recovers from its failure. ■

We emphasize that the structure of the equations depends 
heavily on the network topology, the connections provisioned 
and the protection paths. Therefore it is hard to state a more 
specific result about the conditions under which protection 
is guaranteed. However, under certain structured topologies it 
may be possible to provide a characterization of the conditions 
that can be checked without having to verify each possible 
system of equations. 

For example, if all connections are protected by M protec- 
tion paths, it is easy to see the sufficient condition for data 
recovery from any M failures is that the coefficient matrix of 
the system of equations each node obtains under any failure 
pattern with M failures has full rank. As will be shown next, 
our coefficient assignment methods are such that the sufficient 
conditions above hold. 



Next we construct a $K \times N$ matrix to facilitate the discussion of coefficient assignment. According to the encoding protocol, each connection $S_i \leftrightarrow T_j$ has coefficient $\alpha_{i \leftrightarrow j,k}$ for encoding on $P_k$. In general, there are at most $K \times N$ coefficients for a network with N primary paths $S_{i_1} \leftrightarrow T_{j_1}, S_{i_2} \leftrightarrow T_{j_2}, \ldots, S_{i_l} \leftrightarrow T_{j_l}, \ldots, S_{i_N} \leftrightarrow T_{j_N}$ and K protection paths $P_1, P_2, \ldots, P_K$. We form a $K \times N$ matrix A where $A_{kl} = \alpha_{i_l \leftrightarrow j_l,k}$ if $S_{i_l} \leftrightarrow T_{j_l}$ is protected by $P_k$, and $A_{kl} = 0$ otherwise. Here, l is the index for primary paths and each column of A corresponds to a primary path. Each row of A corresponds to a protection path. This matrix contains all encoding coefficients and, in general, some zeros induced by the topology. It is easy to see that under any failure pattern, the coefficient matrix of the system of equations at any node of any failed connection is a submatrix of matrix A. We require these submatrices of A to have full rank. We shall discuss the construction of A, i.e., the assignment of proper coefficients, in Section VI.

VI. Encoding coefficient assignment 

In this section, we shall discuss encoding coefficient assignment strategies for the proposed network coding schemes, i.e., how to construct A properly. Under certain assumptions on the topology, two special matrix based assignments can provide a tight field size bound and efficient decoding algorithms. We shall also introduce a matrix completion method for general topologies.

Note that the coefficient assignment is done before the actual transmission. Once the coefficients have been determined, they need not be changed during data transmission. Thus, for the schemes that guarantee successful recovery only with high probability, we can keep generating the matrix A until the full rank condition discussed at the end of the previous section is satisfied. This only needs to be done once. After that, during the actual transmission, the recovery is guaranteed to succeed.

A. Special matrix based assignment 

In this and the next subsection, we assume that all primary paths are protected by the same protection paths. This implies that matrix A only consists of encoding coefficients. It does not contain zeros induced by the topology. Thus, we can let A be a matrix with a special structure such that any submatrix of A has full rank. The network will then be able to recover from any failure pattern with M (or fewer) failures. Without loss of generality, we shall focus on the case when M = K, where K is the number of protection paths. If M failures happen, of which $t_1$ failures happen on primary paths, each node will get $M - (M - t_1) = t_1$ equations with $t_1$ unknowns corresponding to the $t_1$ primary path failures. The $t_1 \times t_1$ coefficient matrix is a square submatrix of A, and it is the same for each node under one failure pattern.

First, we shall show a Vandermonde matrix-based coefficient assignment. It requires the field size to be $q \geq N$. If all failures happen on primary paths, the recovery at each node is guaranteed. In this assignment strategy, we pick N distinct elements from GF(q), $\lambda_1, \ldots, \lambda_N$, and assign one to each primary path. At nodes $S_{i_l}$ and $T_{j_l}$, $\lambda_l^{k-1}$ is used as the encoding coefficient on protection path $P_k$, i.e., $A_{kl} = \alpha_{i_l \leftrightarrow j_l,k} = \lambda_l^{k-1}$. In other words, A is a Vandermonde matrix [26, Section 6.1]:

$$A = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_N \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_N^2 \\ \vdots & \vdots & & \vdots \\ \lambda_1^{K-1} & \lambda_2^{K-1} & \cdots & \lambda_N^{K-1} \end{bmatrix}$$

Suppose M failures happen on primary paths and the indices of the failed connections are $e_1, \ldots, e_M$. Every node then gets a system of linear equations whose coefficient matrix has the form:

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_{e_1} & \lambda_{e_2} & \cdots & \lambda_{e_M} \\ \lambda_{e_1}^2 & \lambda_{e_2}^2 & \cdots & \lambda_{e_M}^2 \\ \vdots & \vdots & & \vdots \\ \lambda_{e_1}^{M-1} & \lambda_{e_2}^{M-1} & \cdots & \lambda_{e_M}^{M-1} \end{bmatrix}$$

This matrix is an $M \times M$ Vandermonde matrix. As long as $\lambda_{e_1}, \lambda_{e_2}, \ldots, \lambda_{e_M}$ are distinct, this matrix is invertible and $S_{i_{e_l}}$ can recover $u_{j_{e_l}}$. We choose $\lambda_1, \ldots, \lambda_N$ to be distinct so that the submatrix formed by any M columns of A has full rank. The smallest field size we need is the number of connections we want to protect, i.e., $q \geq N$. Moreover, the complexity of solving a linear system with a Vandermonde coefficient matrix is $O(M^2)$ [19]. Thus, decoding is more efficient than with arbitrarily chosen coefficients, for which, even if the system is solvable, the complexity of Gaussian elimination is $O(M^3)$.

If $M - t_1$ failures happen on protection paths, we require that any $t_1 \times t_1$ square submatrix formed by choosing any $t_1$ columns and $t_1$ rows from A has full rank. Although the chance is large, the Vandermonde matrix cannot guarantee this for sure [20, p. 323, Problem 7], [22], [23]. We shall propose another special matrix that guarantees successful recovery for combined failures, at the expense of a slightly larger field size compared to the Vandermonde matrix assignment.
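Both observations about the Vandermonde assignment can be checked numerically on a small instance. The sketch below assumes $GF(2^8)$ with reduction polynomial 0x11B and hypothetical sizes (N = 5, K = M = 3); it verifies that every choice of M columns gives an invertible matrix, and then exhibits a square submatrix built from non-consecutive rows that is singular, using an element `omega` with $\omega^3 = 1$, $\omega \neq 1$ found by search.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse via a^254; a must be nonzero."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def gf_rank(rows):
    """Rank over GF(2^8) by Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = gf_inv(m[rank][col])
        m[rank] = [gf_mul(inv, v) for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [v ^ gf_mul(f, w) for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank


def vandermonde(lams, K):
    """K x N matrix whose row k (0-based) holds lams[l] ** k."""
    rows, row = [], [1] * len(lams)
    for _ in range(K):
        rows.append(row)
        row = [gf_mul(v, lam) for v, lam in zip(row, lams)]
    return rows


# Primary-path failures only: any M columns of the first M rows are invertible.
N, M = 5, 3
A = vandermonde([1, 2, 3, 4, 5], M)
primary_ok = all(
    gf_rank([[row[c] for c in cols] for row in A]) == M
    for cols in combinations(range(N), M)
)

# Combined failures: rows 0 and 3 with lambda values 1 and omega are dependent.
omega = next(a for a in range(2, 256) if gf_mul(a, gf_mul(a, a)) == 1)
V = vandermonde([1, omega], 4)
singular = [V[0], V[3]]          # both rows equal [1, 1] because omega**3 == 1
print(primary_ok, gf_rank(singular))
```

The second part corresponds to a pattern in which protection paths fail, so that the surviving equations no longer come from consecutive powers.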

In order to achieve this goal, we resort to the Cauchy matrix [20], of which any square submatrix has full rank if the entries are chosen carefully.

Definition 2: Let $\{x_1, \ldots, x_{m_1}\}$, $\{y_1, \ldots, y_{m_2}\}$ be two sets of elements in a field F such that

(i) $x_i + y_j \neq 0$, $\forall i \in \{1, \ldots, m_1\}$, $\forall j \in \{1, \ldots, m_2\}$;

(ii) $\forall i, j \in \{1, \ldots, m_1\}, i \neq j: x_i \neq x_j$ and $\forall i, j \in \{1, \ldots, m_2\}, i \neq j: y_i \neq y_j$.

The matrix $C = (c_{ij})$ where $c_{ij} = 1/(x_i + y_j)$ is called a Cauchy matrix.

If $m_1 = m_2$, the Cauchy matrix becomes square and its determinant is [20]:

$$\det(C) = \frac{\prod_{1 \leq i < j \leq m_1} (x_j - x_i)(y_j - y_i)}{\prod_{1 \leq i, j \leq m_1} (x_i + y_j)}$$

Note that in GF(q), where q is some power of 2, addition and subtraction are equivalent. Therefore, as long as $x_1, \ldots, x_{m_1}, y_1, \ldots, y_{m_1}$ are distinct, the Cauchy matrix has full rank, and any square submatrix of it is also a Cauchy matrix (by definition) with full rank. For our protection problem, we let matrix A be a $K \times N$ Cauchy matrix. $\{x_1, \ldots, x_K\}$, $\{y_1, \ldots, y_N\}$ are chosen to be distinct. Thus, the smallest field size we need is K + N. Suppose there are $t_1$ failures on primary paths and $M - t_1$ failures on protection paths. The coefficient matrix of the system of equations obtained by a node is then a $t_1 \times t_1$ submatrix of A. It is still a Cauchy matrix by definition and is invertible. Thus, the network can recover from any M failures. Moreover, the inversion can be done in $O(t_1^2)$ [21], which provides an efficient decoding algorithm.
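A small numerical check of the Cauchy assignment is sketched below. It assumes $GF(2^8)$ with reduction polynomial 0x11B, hypothetical sizes K = 3 and N = 4, and the arbitrary choice of the K + N distinct field elements 0, ..., 6; it then verifies exhaustively that every square submatrix of the resulting matrix has full rank.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial 0x11B is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def gf_inv(a):
    """Multiplicative inverse via a^254; a must be nonzero."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def gf_rank(rows):
    """Rank over GF(2^8) by Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        piv = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = gf_inv(m[rank][col])
        m[rank] = [gf_mul(inv, v) for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [v ^ gf_mul(f, w) for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank


def cauchy(xs, ys):
    """Entry (i, j) is 1 / (x_i + y_j); addition in GF(2^8) is XOR."""
    return [[gf_inv(x ^ y) for y in ys] for x in xs]


K, N = 3, 4
xs = [0, 1, 2]          # one element per protection path (row)
ys = [3, 4, 5, 6]       # one element per primary path (column)
A = cauchy(xs, ys)

# Every t x t submatrix, for every t, must be invertible.
all_ok = all(
    gf_rank([[A[r][c] for c in cols] for r in rows]) == t
    for t in range(1, K + 1)
    for rows in combinations(range(K), t)
    for cols in combinations(range(N), t)
)
print(all_ok)
```

Because the x and y values are pairwise distinct, no denominator $x_i + y_j$ is zero, which is what makes the K + N distinct elements sufficient in this sketch.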

B. Random assignment 

We could also choose the coefficients at random from a large finite field. More specifically, we have the following claim [27].

Claim 3: When all coefficients are randomly, independently and uniformly chosen from GF(q), the probability that a $t_1$-by-$t_1$ matrix has full rank is $p(t_1) = \prod_{i=1}^{t_1} (1 - 1/q^i)$, $1 \leq t_1 \leq M$.

Under one failure pattern with $t_1$ failures on the primary paths and $M - t_1$ failures on the protection paths, every failed connection obtains equations that have the same $t_1$-by-$t_1$ coefficient matrix. The probability that it is full rank is $p(t_1)$, and it goes to 1 when q is large. Note that there are $\sum_{t_1=1}^{M} \binom{N}{t_1}\binom{M}{M-t_1}$ possible failure patterns when the total number of failures is M. Thus, by the union bound, the probability of successful recovery under any failure pattern with M failures is at least $1 - \sum_{t_1=1}^{M} \binom{N}{t_1}\binom{M}{M-t_1}(1 - p(t_1))$, and it approaches 1 as q increases.
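The effect of the field size on this bound can be tabulated directly. The sketch below is a hedged illustration: it assumes K = M protection paths and counts failure patterns with $t_1$ primary-path failures as $\binom{N}{t_1}\binom{M}{M-t_1}$; the function names and the sample values of N, M and q are hypothetical.

```python
from math import comb


def p_full_rank(t, q):
    """Probability that a uniformly random t x t matrix over GF(q) is invertible."""
    p = 1.0
    for i in range(1, t + 1):
        p *= 1.0 - q ** (-i)
    return p


def recovery_lower_bound(N, M, q):
    """Union-bound estimate of the probability that every pattern is recoverable.

    Assumes K = M protection paths and the pattern count described above.
    The value can be negative when q is too small for the bound to be useful.
    """
    failure = sum(
        comb(N, t) * comb(M, M - t) * (1.0 - p_full_rank(t, q))
        for t in range(1, M + 1)
    )
    return 1.0 - failure


for m in (8, 16, 32):
    print(m, recovery_lower_bound(N=10, M=2, q=2 ** m))
```

For N = 10 and M = 2 the estimate is loose at q = 2^8 and becomes close to 1 at q = 2^16, which matches the qualitative statement that the bound improves as q grows.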

C. Matrix completion for general topology 

If the primary paths are protected by different protection paths, as in Figure 5, there are some zeros in A induced by the topology. We want to choose encoding coefficients so that under every failure pattern with M or fewer failures, the coefficient matrix of the system of equations obtained by every node is invertible. We can view the encoding coefficients in A as indeterminates to be decided. The matrices we require to have full rank are a collection $\mathcal{C}_A$ of submatrices of A, where $\mathcal{C}_A$ depends on the failure patterns and the network topology. Each matrix in $\mathcal{C}_A$ consists of some indeterminates and some zeros. The problem of choosing encoding coefficients can be solved by matrix completion [24]. A simultaneous max-rank completion of $\mathcal{C}_A$ is an assignment of values from GF(q) to the indeterminates that preserves the rank of all matrices in $\mathcal{C}_A$. After completion, each matrix will have the maximum possible rank. Matrix completion can be done by deterministic algorithms [24]. Moreover, simply choosing a completion at random from a sufficiently large field can achieve the maximum rank with high probability [25]. Hence, we can choose encoding coefficients randomly from a large field.

VII. ILP Formulation for Single-link Failure 

The problem of provisioning the working paths and their protection paths in a random graph is a hard problem. This is due to the fact that the problem of finding link disjoint paths between multiple pairs of nodes in a graph is known to be NP-complete [11]. Therefore, in this section we formulate an integer linear program that optimally provisions a set of unicast connections, and their protection paths against single-link failure. The optimality criterion is the minimization of the sum of the working and protection resources.

Fig. 7. An example to show: (a) the graph G in solid line and its modified graph G'; (b) the provisioning of the connections ((0-3) and (5-3-2)) and their protection path (s-5-0-2-1-3-t), where the two links (s-5) and (3-t) are not included in the cost of the protection circuit.

The problem can be stated as follows: Given a bidirectional graph G = (V, E) and a traffic demand matrix of unicast connections, $\mathcal{N}$, establish a connection for each bidirectional traffic request $j \in \mathcal{N}$, and a number of protection paths that traverse all the end nodes of the connections in $\mathcal{N}$, defined by set C, such that:

• A path protecting a connection must pass through the end 
nodes of the connection. 

• The connections jointly protected by the same path must 
be mutually link disjoint, and also link disjoint from the 
protection path. 

• The total number of edges used for both working and 
protection paths is minimum. 

We also assume that the network is uncapacitated. 

In order to formulate this problem, we modify the graph G to obtain the graph G' by adding a hypothetical source s and a hypothetical sink t. We also add a directed edge from s to each node v, where $v \in C$, as well as a directed edge from each such node v to t. An example is shown in Figure 7. Figure 7(a) shows a graph G with six nodes and ten bidirectional edges and the corresponding modification to the graph G' given two traffic requests N = {(0, 3), (5, 2)}. Figure 7(b) shows the provisioning of the two connections in N and their protection path from s to t. Therefore, the problem of finding the protection paths turns out to be that of establishing connections from node s to t that traverse all the nodes $v \in C$. For each subset of connections that are protected together, the two end nodes of each of these traffic requests have to be traversed by the same protection path.
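The requirements on a provisioning can be checked mechanically for the example of Figure 7(b). The sketch below is not the ILP itself: it only verifies a given solution against the three stated requirements, using the paths quoted in the caption of Figure 7, and it assumes unit link costs. The helper names `edges_of` and `is_valid` are hypothetical.

```python
def edges_of(path):
    """Undirected links used by a node sequence."""
    return {frozenset(e) for e in zip(path, path[1:])}


def is_valid(connections, protection):
    """Check one protection path against the connections it protects.

    connections : {(end_a, end_b): working path as a node list}
    protection  : protection path as a node list (s and t removed)
    """
    p_edges = edges_of(protection)
    used = set()
    for (a, b), path in connections.items():
        if path[0] != a or path[-1] != b:
            return False              # working path must join the end nodes
        if a not in protection or b not in protection:
            return False              # protection path must visit both ends
        w = edges_of(path)
        if w & p_edges or w & used:
            return False              # link-disjointness violated
        used |= w
    return True


# Paths from the Figure 7(b) caption; links (s-5) and (3-t) carry no cost.
connections = {(0, 3): [0, 3], (5, 2): [5, 3, 2]}
protection = [5, 0, 2, 1, 3]

valid = is_valid(connections, protection)
# Total number of links used, assuming every link has cost 1.
cost = sum(len(edges_of(p)) for p in connections.values()) + len(edges_of(protection))
print(valid, cost)
```

Under the unit-cost assumption the example uses three working links and four protection links; routing connection (5, 2) over 5-0-2 instead would reuse links of the protection path and fail the check.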

This disjoint paths routing problem can be formulated as an ILP as follows (note that G = (V, E) and G' = (V', E') denote the original and modified graphs in the formulation). It is to be noted that the number of protection paths must satisfy:

$1 \leq$ number of protection paths $\leq N$.

We may have more than one protection path because it is possible that the primary connections are partitioned into several sets and each set of primary connections shares one protection path. However, the worst case is that each primary path requires a unique protection path (the case of 1+1 protection), which results in a total of N protection paths. In the formulation, therefore, we have a maximum of 2N paths:

• Connections indexed from 1 to N are the ones given by 
the set N, and these should be provisioned in the network. 

• Connections indexed from N+l to 2N are hypothetical 
connections, which correspond to protection connections, 
and at least one of them should be provisioned. 

The ILP is formulated as a network flow problem, where 
there is a flow of one unit between each pair of end nodes of 
a connection, and there is also a flow of one unit from s to t 
for each protection path. 

We define the following parameters, which are input to the ILP:

G(V, E): the original network graph
G'(V', E'): the modified graph
$\mathcal{N}$: the set of unicast connections
$c_{mn}$: a constant, the cost of link $(m, n) \in E$
$v_j$: the set of end nodes of connection j in $\mathcal{N}$, $v_j = \{s_j, t_j\}$, which are different notations from the previous definition of a connection, denoted by $S_i$, $T_j$, where i, j are the indices for the nodes.

We also define the following variables, which are computed by the ILP:

$f^i_{mn}$: binary, equals 1 if the protection path i traverses link (m, n) in G
$zf^i_m$: integer, the number of times that the node $m \in V$ is traversed by path i
$U^i_j$: binary, equals 1 if connection j is protected by path i
$p^j_{mn}$: binary, equals 1 if the working flow of j traverses link $(m, n) \in G$
$q^j_{mn}$: binary, equals 1 if the protection flow of j traverses link $(m, n) \in G$
$zp^j_m$: integer, the number of times that node $m \in V$ is traversed by the working flow of j
$zq^j_m$: integer, the number of times that node $m \in V$ is traversed by the protection flow of j

The objective function is:

$$\text{Minimize: } \sum_{(m,n) \in E} c_{mn} \Big( \sum_{1 \leq j \leq N} p^j_{mn} + \sum_{N < i \leq 2N} f^i_{mn} \Big)$$

The objective function minimizes the total cost of links used by the working paths (first term) and by the protection paths (second term). Note that a protection path starts at s and ends at t in the modified graph G', but we only consider the cost of links in the original graph G.
The constraints are such that:

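1) Working Flow Conservation:

$$\sum_{n:(s_j,n) \in E} p^j_{s_j n} = 1, \quad \forall j \leq N; \qquad (11)$$

$$\sum_{n:(m,n) \in E} p^j_{mn} = 2\, zp^j_m, \quad \forall j \leq N,\ \forall m \in V \setminus \{s_j, t_j\}; \qquad (12)$$

The constraints (11) and (12) are standard flow conservation constraints for working traffic, which ensure that a bidirectional path is established between end nodes $s_j$ and $t_j$ of connection j.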

2) Protection Flow Conservation:

For $\forall j \leq N$, $N < i \leq 2N$:

$$\sum_{n:(s_j,n) \in E} q^j_{s_j n} = 1; \qquad (13)$$

$$\sum_{n:(m,n) \in E} q^j_{mn} = 2\, zq^j_m, \quad \forall m \in V \setminus \{s_j, t_j\}; \qquad (14)$$

Constraints (13) and (14) make sure that each connection j has a protection flow.

$$\sum_{n:(s,n) \in E'} f^i_{sn} \leq 1; \qquad (15)$$

$$\sum_{n:(m,n) \in E} f^i_{mn} = 2\, zf^i_m, \quad \forall m \in V; \qquad (16)$$

The flow conservation of protection paths is ensured by constraints (15) and (16). It is worth noting that not every protection path i ($N < i \leq 2N$) is required unless it is used for protection.

$$\sum_{N < i \leq 2N} U^i_j = 1; \qquad (17)$$

$$\sum_{j \leq N} U^i_j \geq \sum_{n:(s,n) \in E'} f^i_{sn}; \qquad (18)$$

$$f^i_{mn} \geq q^j_{mn} + U^i_j - 1, \quad \forall (m, n) \in E; \qquad (19)$$

Each working flow should be protected by exactly one protection path, which is guaranteed by constraint (17). Meanwhile, any protection path i is provisioned only if it is used to protect some working path j. Otherwise, we do not need to provision it. Constraint (18) ensures this. Furthermore, constraint (19) ensures that if a protection path i protects connection j, it should traverse the same links used by the protection flow $q^j_{mn}$.
3) Protection Path Sharing:

For $\forall (m, n) \in E$, $N < i \leq 2N$:

$$p^j_{mn} + q^j_{mn} \leq 1, \quad \forall j \leq N; \qquad (20)$$

$$p^j_{mn} + f^i_{mn} + U^i_j \leq 2, \quad \forall j \leq N; \qquad (21)$$

$$p^j_{mn} + p^k_{mn} + U^i_j + U^i_k \leq 3, \quad \forall j < k \leq N. \qquad (22)$$

The working flow and protection flow of each connection j should be link disjoint, which is reflected by constraint (20). Each protection path may protect multiple connections, so that it needs to traverse multiple corresponding protection flows. Thus, each protection path should also be link disjoint from all the working flows it protects. This is ensured by constraint (21). Meanwhile, if two connections are protected by the same path i, their working flows should also be link disjoint, so that codewords can be decoded at each end node through the protection path. This last requirement is guaranteed by constraint (22).

The total number of variables used in the ILP is $(3N|V| + 3N|E| + N^2)$ and the total number of constraints is $(6N|V| + 2N + 2N^2|E| + N|E| + N^2(N - 1)|E|)$, which is dominated by $O(N^3|E|)$.

VIII. Numerical Results 

This section presents numerical results on the cost of our proposed protection scheme and compares it to 1+1 protection and Shared Backup Path Protection (SBPP) in terms of total resource requirements for protection against single-link failure. SBPP has been proven to be the most capacity efficient protection scheme and can achieve optimal solutions [12]. However, it is also a reactive protection mechanism and takes time to detect, localize and recover from failures. We consider two realistic network topologies, NSFNET and COST239, as shown in Fig. 8 and Fig. 9, respectively. Both networks are bidirectional and each bidirectional span e has a cost $c_e$, which equals the actual distance in kilometers between its two end nodes.

We first compare the three schemes in terms of the total connection and protection provisioning cost in both networks, as shown in Fig. 10 and Fig. 11, respectively. We obtained the results by formulating the problems as ILPs using the three different approaches. The x-axis denotes the number of connections in the static traffic matrix and the y-axis denotes the total network design cost. Each value is the average cost over ten independent cases, and all approaches used identical traffic requests for each case.

Since SBPP is the most capacity efficient scheme, it achieves the minimum cost. The 1+N approach has a much lower cost than 1+1, but a higher cost than SBPP in both networks. We express the extra cost ratio of a scheme over SBPP by $(Cost_{scheme} - Cost_{SBPP}) / Cost_{SBPP}$. The extra cost ratio of 1+N in NSFNET increases from 5.2% to 23% as the number of connections increases from 2 to 7. Meanwhile, the extra cost ratio of 1+1 over SBPP increases from 12% to 45%, which is almost twice that of 1+N in each case. The advantage of 1+N over 1+1 in COST239 is even more significant than in NSFNET due to the larger average nodal degree, 4.6, compared to 3 in NSFNET. Hence, there is a higher chance for multiple primary paths to share the same protection path, which results in lower overall cost. Based on the results, we can observe that the extra cost ratio of 1+N over SBPP in COST239 increases from 1.8% to 11.1%, whereas the ratio of 1+1 over SBPP increases from 10.2% to 38%, as the number of connections increases from 2 to 7. Actually, the cost of using 1+N is very close to the optimal in the COST239 network. The extra cost required by 1+N over the optimal solution is less than 27% of that required by the 1+1 scheme.

Fig. 9. COST239 (N=11, E=26)

Fig. 10. Comparison of total cost in NSFNET

Fig. 11. Comparison of total cost in COST239 network

In fact, if we only consider the cost of protection, i.e. 
exclude the cost of connection provisioning, 1+N protection 
uses much lower resources than 1+1 protection. For example, 
by examining one network scenario where there are seven 
connections in COST239 network, the average protection cost 
of using SBPP, 1+N and 1+1 protection schemes is 3586.0, 
4313.5 and 6441.5, respectively. The saving ratio of 1+N to 
1+1 is around 33%, which is higher than the saving ratio of 
joint capacity cost (19.3%). This example further illustrates 
the cost saving advantages of using 1+N protection over 1+1 
protection. 
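The two ratios used in this comparison are simple to recompute from the reported averages. The sketch below uses only the three protection-cost figures quoted above for seven connections in COST239; the function names are hypothetical.

```python
def extra_cost_ratio(cost_scheme, cost_sbpp):
    """(Cost_scheme - Cost_SBPP) / Cost_SBPP, as defined in the text."""
    return (cost_scheme - cost_sbpp) / cost_sbpp


def saving_ratio(cost_new, cost_reference):
    """Fraction of the reference cost saved by the new scheme."""
    return (cost_reference - cost_new) / cost_reference


# Average protection-only costs reported for seven connections in COST239.
protection_cost = {"SBPP": 3586.0, "1+N": 4313.5, "1+1": 6441.5}

saving = saving_ratio(protection_cost["1+N"], protection_cost["1+1"])
extra_1n = extra_cost_ratio(protection_cost["1+N"], protection_cost["SBPP"])
extra_11 = extra_cost_ratio(protection_cost["1+1"], protection_cost["SBPP"])

print(f"saving of 1+N over 1+1: {saving:.1%}")
print(f"extra protection cost over SBPP: 1+N {extra_1n:.1%}, 1+1 {extra_11:.1%}")
```

The first printed value agrees with the saving of around 33% stated above; the second line applies the extra cost ratio to protection cost alone, which the text does not report explicitly.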

In summary, 1+N protection has a traffic recovery speed which is comparable to that of 1+1 protection. However, it performs
significantly better than 1+1 scheme in terms of protection 
cost. Compared with the most capacity efficient protection 
scheme, SBPP, 1+N protection performs close to SBPP in 
terms of total capacity cost in dense networks. However, SBPP 
takes much longer to recover from failures due to the long 
switch reconfiguration time and traffic rerouting, which are 
not required in 1+N protection. 

IX. Conclusions 

This paper has introduced a resource-efficient and fast 
method for providing protection for a group of connections 
such that a second copy of each data unit transmitted on 
the working circuits can be recovered without detecting 
the failure or rerouting data. This is done by linearly 
combining the data units using the technique of network 
coding, and transmitting these combinations on a shared set 
of protection circuits in two opposite directions. The reduced 
number of resources is due to the sharing of the protection 
circuit to transmit linear combinations of data units from 
multiple sources. The coding is the key to the instantaneous 
recovery of the information. This provides protection against 
any single link failure on any of the working circuits. The 
paper also generalized this technique to provide protection 
against multiple link failures. 
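
The recovery principle can be illustrated with a deliberately simplified sketch. It assumes equal-length data units combined by bitwise XOR (addition over GF(2)) on a single protection circuit, and it ignores the two opposite transmission directions and the multiple-failure generalization; the function names and sample data are ours, not part of the proposed protocol.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length data units."""
    return bytes(x ^ y for x, y in zip(a, b))

def combine(units: list) -> bytes:
    """Linear combination over GF(2): XOR of all given data units."""
    return reduce(xor_bytes, units)

# Three connections each send one data unit on their working circuits.
data_units = [b"\x0a\x01", b"\x0b\x02", b"\x0c\x03"]

# The shared protection circuit carries the combination of all units.
protection = combine(data_units)

# Suppose the working circuit of connection 1 fails: its unit never arrives.
failed = 1
received = [u for i, u in enumerate(data_units) if i != failed]

# XORing the protection unit with the units that did arrive cancels them
# out and leaves the missing unit, with no failure detection or rerouting.
recovered = combine(received + [protection])
assert recovered == data_units[failed]
```

Because the same computation is performed whether or not a failure has occurred, the second copy is available as soon as the combination arrives.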

The method introduced in this paper improves on the technique 
introduced in [15] and [13]. In particular, (a) it requires fewer 
protection resources, and (b) it implements coding using a 
simpler synchronization strategy. A cost comparison study of 
providing protection against single link failures has shown 
that the proposed technique introduces a significant saving 
over typical protection schemes, such as 1+1 protection, while 
achieving a comparable speed of recovery. The numerical 
results also show that the cost of our 1+N scheme is close 
to that of SBPP, the most capacity-efficient protection scheme. 
However, the proposed scheme provides much faster 
recovery than SBPP. 